Bayesian Distributed Stochastic Gradient Descent

Michael Teng, Frank Wood

Neural Information Processing Systems

We introduce Bayesian distributed stochastic gradient descent (BDSGD), a high-throughput algorithm for training deep neural networks on parallel computing clusters. This algorithm uses amortized inference in a deep generative model to perform joint posterior predictive inference of mini-batch gradient computation times in a compute-cluster-specific manner. Specifically, our algorithm mitigates the straggler effect in synchronous, gradient-based optimization by choosing an optimal cutoff beyond which mini-batch gradient messages from slow workers are ignored. The principal novel contribution and finding of this work goes beyond this by demonstrating that using the predicted run-times from a generative model of cluster worker performance improves over the static-cutoff prior art, leading to higher gradient computation throughput on large compute clusters. In our experiments we show that eagerly discarding the mini-batch gradient computations of stragglers not only increases throughput but sometimes also increases the overall rate of convergence as a function of wall-clock time by virtue of eliminating idleness.
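The cutoff mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's method: the `straggler_cutoff` helper below simply takes a quantile of predicted run-times, whereas BDSGD derives its predictions from amortized inference in a deep generative model of cluster performance.

```python
import numpy as np

def straggler_cutoff(predicted_times, target_fraction=0.9):
    # Hypothetical helper: pick a cutoff time so that roughly
    # `target_fraction` of workers are expected to finish in time.
    return float(np.quantile(predicted_times, target_fraction))

def aggregate_gradients(gradients, arrival_times, cutoff):
    # Average only the mini-batch gradients that arrive before the cutoff;
    # stragglers' contributions are dropped to avoid idle waiting.
    kept = [g for g, t in zip(gradients, arrival_times) if t <= cutoff]
    return np.mean(kept, axis=0), len(kept)

# Toy example: 8 workers, one slow straggler (times in seconds, simulated;
# here the "actual" times equal the predictions for simplicity).
predicted = np.array([1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.95, 5.0])
gradients = [np.ones(3) * i for i in range(8)]
cutoff = straggler_cutoff(predicted, target_fraction=0.9)
avg_grad, n_used = aggregate_gradients(gradients, predicted, cutoff)
print(n_used)  # the slowest worker's gradient is discarded
```

The design point is that the cutoff trades a small amount of gradient information for the elimination of synchronization stalls, which is where the throughput gain comes from.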




Meta builds world's largest AI superclusters for the future

FOX News

The CyberGuy Kurt Knutsson joins 'Fox & Friends' to discuss the U.S.-Saudi investment summit and the debate over regulation as artificial intelligence continues to advance. What happens when one of the world's richest companies decides to go all-in on artificial intelligence? If you're Meta Platforms CEO Mark Zuckerberg, it means launching superclusters so large they could rival the footprint of Manhattan. Recently, Zuckerberg unveiled plans to invest "hundreds of billions of dollars" into next-generation AI infrastructure, including some of the largest compute clusters the world has ever seen. Meta's first supercluster, called Prometheus, is slated to go live in 2026.


VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers

Wang, Run, Islamoglu, Gamze, Belano, Andrea, Potocnik, Viviane, Conti, Francesco, Garofalo, Angelo, Benini, Luca

arXiv.org Artificial Intelligence

While Transformers are dominated by Floating-Point (FP) Matrix-Multiplications, their aggressive acceleration through dedicated hardware or many-core programmable systems has shifted the performance bottleneck to non-linear functions like Softmax. Accelerating Softmax is challenging due to its non-pointwise, non-linear nature, with exponentiation as the most demanding step. To address this, we design a custom arithmetic block for Bfloat16 exponentiation leveraging a novel approximation algorithm based on Schraudolph's method, and we integrate it into the Floating-Point Unit (FPU) of the RISC-V cores [1] of a compute cluster, through custom Instruction Set Architecture (ISA) extensions, with a negligible area overhead of 1%. By optimizing the software kernels to leverage the extension, we execute Softmax with 162.7x lower latency and 74.3x less energy compared to the baseline cluster, achieving an 8.2x performance improvement and 4.1x higher energy efficiency for the FlashAttention-2 kernel in GPT-2 configuration. Moreover, the proposed approach enables a multi-cluster system to efficiently execute end-to-end inference of pre-trained Transformer models, such as GPT-2, GPT-3 and ViT, achieving up to 5.8x and 3.6x reductions in latency and energy consumption, respectively, without requiring re-training and with negligible accuracy loss. Transformer-based models such as the GPT family [2] and the LLaMa family [3] have emerged as a cornerstone of machine learning, demonstrating state-of-the-art performance in diverse domains, including natural language processing (NLP), computer vision, and audio processing. At the core of their success is the Transformer architecture [4], which utilizes the self-attention mechanism to model complex relationships within input sequences.
In encoders and the prefill stage of decoders, the computational complexity of attention layers scales quadratically with the input sequence length, leading to memory and computational overheads that necessitate mitigation by means of dedicated acceleration. This work was supported by the NeuroSoC project, funded under the European Union's Horizon Europe research and innovation programme (Grant Agreement No. 101070634).


Why Europe's Efforts to Gain AI Autonomy Might Be Too Little Too Late

TIME - Tech

This week Microsoft announced that it would invest €3.2 billion ($3.5 billion) in Germany over the next two years. The U.S. tech giant will use the money to double the capacity of its artificial intelligence and data center infrastructure in Germany and expand its training programmes, according to Microsoft vice chair and president Brad Smith. The move follows a similar announcement from November 2023, when Microsoft said it would invest £2.5 billion ($3.2 billion) in infrastructure in the U.K. over the next three years. Both countries hailed the investments as significant steps that would permit them to compete on the world stage when it comes to AI. However, the investments are dwarfed by investments made by U.S.-based cloud service providers elsewhere, particularly in the U.S. As AI becomes increasingly economically and militarily important, governments are taking steps to ensure they have control over the technology that they depend on.


Compute at Scale: A Broad Investigation into the Data Center Industry

Pilz, Konstantin, Heim, Lennart

arXiv.org Artificial Intelligence

This report characterizes the data center industry and its importance for AI development. Data centers are industrial facilities that efficiently provide compute at scale and thus constitute the engine rooms of today's digital economy. As large-scale AI training and inference become increasingly computationally expensive, they are predominantly executed on this dedicated infrastructure. Key features of data centers include large-scale compute clusters that require extensive cooling and consume large amounts of power, the need for fast connectivity both within the data center and to the internet, and an emphasis on security and reliability. The global industry is valued at approximately $250B and is expected to double over the next seven years. There are likely about 500 large (above 10 MW) data centers globally, with the US, Europe, and China constituting the most important markets. The report further covers important actors, business models, main inputs, and typical locations of data centers.


Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain--Machine Interfaces

Schneider, Tibor, Wang, Xiaying, Hersche, Michael, Cavigelli, Lukas, Benini, Luca

arXiv.org Artificial Intelligence

Motor-Imagery Brain--Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines by analyzing brain activities recorded with Electroencephalography (EEG). Latency, reliability, and privacy constraints make it unsuitable to offload the computation to the cloud. Practical use cases demand a wearable, battery-operated device with low average power consumption for long-term use. Recently, sophisticated algorithms, in particular deep learning models, have emerged for classifying EEG signals. While reaching outstanding accuracy, these models often exceed the limitations of edge devices due to their memory and computational requirements. In this paper, we demonstrate algorithmic and implementation optimizations for EEGNET, a compact Convolutional Neural Network (CNN) suitable for many BMI paradigms. We quantize weights and activations to 8-bit fixed-point with a negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient hardware-aware implementation on the Mr. Wolf parallel ultra-low power (PULP) System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core compute cluster. With our proposed optimization steps, we can obtain an overall speedup of 64x and a reduction of up to 85% in memory footprint with respect to a single-core layer-wise baseline implementation. Our implementation takes only 5.82 ms and consumes 0.627 mJ per inference. With 21.0 GMAC/s/W, it is 256x more energy-efficient than an EEGNET implementation on an ARM Cortex-M7 (0.082 GMAC/s/W).
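The 8-bit fixed-point step can be illustrated with a symmetric per-tensor quantizer, a common scheme for this kind of deployment; this is a generic sketch under that assumption, not necessarily the exact scheme used in the paper.

```python
import numpy as np

def quantize_int8(x, scale):
    # Symmetric 8-bit fixed-point quantization: real value ≈ q * scale,
    # with q confined to the signed 8-bit range [-128, 127].
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # Map the int8 codes back to approximate real values.
    return q.astype(np.float32) * scale

# Toy weight tensor; the scale is derived from the tensor's dynamic range.
w = np.array([-0.51, 0.0, 0.25, 0.49], dtype=np.float32)
scale = np.max(np.abs(w)) / 127.0
q = quantize_int8(w, scale)
w_hat = dequantize(q, scale)
print(np.max(np.abs(w - w_hat)))  # rounding error is bounded by scale/2
```

Because the int8 codes fit four-to-a-word, a RISC-V core with packed-SIMD ISA extensions can process multiple multiply-accumulates per instruction, which is where the reported speedup and memory savings come from.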


Stability AI builds foundation models on Amazon SageMaker

#artificialintelligence

We're thrilled to announce that Stability AI has selected AWS as its preferred cloud provider to power its state-of-the-art AI models for image, language, audio, video, and 3D content generation. Stability AI is a community-driven, open-source artificial intelligence (AI) company developing breakthrough technologies. With Amazon SageMaker, Stability AI will build AI models on compute clusters with thousands of GPUs or AWS Trainium chips, reducing training time and cost by 58%. Stability AI will also collaborate with AWS to enable students, researchers, startups, and enterprises around the world to use its open-source tools and models. "Our mission at Stability AI is to build the foundation to activate humanity's potential through AI. AWS has been an integral partner in scaling our open-source foundation models across modalities, and we are delighted to bring these to SageMaker to enable tens of thousands of developers and millions of users to take advantage of them. We look forward to seeing the amazing things built on these models and helping our customers customize and scale their models and solutions."


Train ML models - Azure Machine Learning

#artificialintelligence

Azure Machine Learning provides multiple ways to submit ML training jobs. In this article, you learn how to submit jobs using several supported methods. Note that SDK v2 is currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.